Smart Paper Notes

TBI-GAN: An Adversarial Learning Approach for Data Synthesis on Traumatic Brain Segmentation

Xiangyu Zhao , Di Zang , Sheng Wang , Zhenrong Shen , Kai Xuan , Zeyu Wei , Zhe Wang , Ruizhe Zheng , Xuehai Wu , Zheren Li

Category: Computer Vision

2022-08-12

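Brain network analysis for traumatic brain injury (TBI) patients is critical for assessing their level of consciousness and evaluating their prognosis, and it requires segmenting certain consciousness-related brain regions. However, a TBI segmentation model is difficult to build because manually annotated MR scans of TBI patients are hard to collect. Data augmentation can alleviate this data scarcity, but conventional augmentation strategies such as spatial and intensity transformations cannot mimic the deformations and lesions found in traumatic brains, which limits the performance of the subsequent segmentation task. To address these issues, we propose a novel medical image inpainting model named TBI-GAN that synthesizes TBI MR scans with paired brain label maps. The main strength of TBI-GAN is that it generates TBI images and the corresponding label maps simultaneously, which previous inpainting methods for medical images have not achieved. We first generate the inpainted image in a coarse-to-fine manner under the guidance of edge information, and then use the synthesized intensity image as the prior for label inpainting. Furthermore, we introduce a registration-based template augmentation pipeline to increase the diversity of the synthesized image pairs and strengthen the data augmentation. Experimental results show that the proposed TBI-GAN method produces sufficient high-quality synthesized TBI images with valid label maps, which greatly improves 2D and 3D traumatic brain segmentation performance compared with the alternatives.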

translated by Google Translate

PAN: Pulse Ansatz on NISQ Machines

Zhiding Liang , Jinglei Cheng , Hang Ren , Hanrui Wang , Fei Hua , Yongshan Ding , Fred Chong , Song Han , Yiyu Shi , Xuehai Qian

Category: Machine Learning

2022-08-02

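Variational quantum algorithms (VQAs) have demonstrated great potential in the NISQ era. In the VQA workflow, the parameters of the ansatz are iteratively updated to approximate the desired quantum states. Various efforts have been made to design better ansatzes with fewer gates. On a quantum computer, a gate-level ansatz is eventually transformed into control signals, such as microwave pulses on transmons, and these control pulses need elaborate calibration to minimize errors such as over-rotation and under-rotation. In the case of VQAs this procedure introduces redundancy, because the variational nature of VQAs can naturally handle over-rotation and under-rotation by updating the amplitude and frequency parameters. Therefore, we propose PAN, a native-pulse ansatz generator framework for VQAs. We generate a native-pulse ansatz with trainable parameters for amplitudes and frequencies. In PAN, we tune parametric pulses, which are natively supported on NISQ computers. Because parameter-shift rules do not hold for a native-pulse ansatz, we need to deploy non-gradient optimizers. To constrain the number of parameters sent to the optimizer, we generate the native-pulse ansatz progressively. Experiments are conducted on both simulators and quantum devices to validate our method. When adopted on NISQ machines, PAN reduces latency by 86% on average. PAN achieves 99.336% and 96.482% accuracy for VQE tasks on H2 and HeH+ respectively, even with considerable noise in NISQ machines.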


Credible Remote Sensing Scene Classification Using Evidential Fusion on Aerial-Ground Dual-view Images

Kun Zhao , Qian Gao , Siyuan Hao , Jie Sun , Lijian Zhou

Category: Computer Vision | Artificial Intelligence

2023-01-02

Because multi-view (multi-source, multi-modal, multi-perspective, etc.) data offer more comprehensive information than data from a single view, they are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, little research explicitly quantifies the data quality of each view during fusion, which leaves these models hard to interpret, unsatisfactory in performance, and inflexible in downstream remote sensing tasks. To fill this gap, this paper introduces evidential deep learning to the task of aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value that describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
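The uncertainty-weighted fusion described above can be illustrated with a minimal Python sketch. It assumes the common evidential deep learning formulation in which per-class evidence parameterizes a Dirichlet distribution and the uncertainty is the number of classes divided by the Dirichlet strength. The function names and the `(1 - uncertainty)` weighting are assumptions chosen for illustration; the abstract does not specify the paper's actual fusion rule.

```python
def opinion(evidence):
    """Turn non-negative per-class evidence into class probabilities and an uncertainty.

    Assumed formulation: alpha_k = e_k + 1, S = sum(alpha_k),
    p_k = alpha_k / S, u = K / S.
    """
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes
    probs = [(e + 1.0) / strength for e in evidence]
    uncertainty = num_classes / strength
    return probs, uncertainty


def fuse(evidence_a, evidence_b):
    """Decision-level fusion that gives the less uncertain view more weight.

    The (1 - u) weighting is a hypothetical stand-in, not the paper's rule.
    """
    probs_a, u_a = opinion(evidence_a)
    probs_b, u_b = opinion(evidence_b)
    w_a, w_b = 1.0 - u_a, 1.0 - u_b
    total = w_a + w_b
    if total == 0.0:  # both views carry no evidence: fall back to equal weights
        w_a = w_b = 0.5
        total = 1.0
    return [(w_a * pa + w_b * pb) / total for pa, pb in zip(probs_a, probs_b)]


# Hypothetical evidence: a confident aerial view and a weak ground view.
aerial = [9.0, 0.0, 0.0]
ground = [0.0, 1.0, 0.0]
fused = fuse(aerial, ground)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this toy case the aerial view has the lower uncertainty, so its prediction dominates the fused decision even though the ground view votes for a different class.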


HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

Qinghao Ye , Guohai Xu , Ming Yan , Haiyang Xu , Qi Qian , Ji Zhang , Fei Huang

Category: Computer Vision | Natural Language Processing

2022-12-30

Video-language pre-training has advanced the performance of various downstream video-language tasks. However, most previous methods directly inherit or adapt typical image-language pre-training paradigms to video-language pre-training, and thus do not fully exploit the unique characteristic of video, i.e., its temporal nature. In this paper, we propose a Hierarchical Temporal-Aware video-language pre-training framework, HiTeA, with two novel pre-training tasks for modeling cross-modal alignment between moments and texts as well as the temporal relations of video-text pairs. Specifically, we propose a cross-modal moment exploration task to explore moments in videos, which results in detailed video moment representations. In addition, the inherent temporal relations are captured by aligning video-text pairs as a whole at different time resolutions with a multi-modal temporal relation exploration task. Furthermore, we introduce the shuffling test to evaluate the temporal reliance of datasets and video-language pre-training models. We achieve state-of-the-art results on 15 well-established video-language understanding and generation tasks, especially on temporal-oriented datasets (e.g., SSv2-Template and SSv2-Label) with 8.6% and 11.1% improvement respectively. HiTeA also demonstrates strong generalization ability when directly transferred to downstream tasks in a zero-shot manner. Models and demo will be available on ModelScope.


Exploring Depth Information for Face Manipulation Detection

Haoyue Wang , Meiling Li , Sheng Li , Zhenxing Qian , Xinpeng Zhang

Category: Computer Vision

2022-12-29

Face manipulation detection has received a lot of attention because of its importance for the reliability and security of face images. Recent studies focus on using auxiliary information or prior knowledge to capture robust manipulation traces, which has been shown to be promising. The face depth map is an important face feature that has been shown to be effective in other areas such as face recognition and face detection, yet it has received little attention in the literature on detecting manipulated face images. In this paper, we explore the possibility of incorporating the face depth map as auxiliary information to tackle the problem of face manipulation detection in real-world applications. To this end, we first propose a Face Depth Map Transformer (FDMT) to estimate the face depth map patch by patch from an RGB face image, which is able to capture the local depth anomalies created by manipulation. The estimated face depth map is then treated as auxiliary information and integrated with the backbone features using a newly designed Multi-head Depth Attention (MDA) mechanism. Various experiments demonstrate the advantage of our proposed method for face manipulation detection.


A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization

Jian Cao , Chen Qian , Yihui Huang , Dicheng Chen , Yuncheng Gao , Jiyang Dong , Di Guo , Xiaobo Qu

Category: Machine Learning

2022-12-29

Implicit regularization is an important way to interpret neural networks. Recent theory has begun to explain implicit regularization with the model of deep matrix factorization (DMF) and to analyze the trajectory of discrete gradient dynamics in the optimization process. These discrete gradient dynamics are relatively small but not infinitesimal, and thus fit well with the practical implementation of neural networks. Currently, discrete gradient dynamics analysis has been successfully applied to shallow networks but encounters the difficulty of complex computation for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, i.e., landscape analysis. It mainly focuses on gradient regions, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF will converge to a second-order critical point after R stages of SPE. This conclusion is further experimentally verified on a low-rank matrix reconstruction problem. This work provides a new theory to analyze implicit regularization in deep learning.


Automatic Recognition and Classification of Future Work Sentences from Academic Articles in a Specific Domain

Chengzhi Zhang , Yi Xiang , Wenke Hao , Zhicheng Li , Yuchen Qian , Yuzhuo Wang

Category: Natural Language Processing

2022-12-28

Future work sentences (FWS) are the particular sentences in academic papers that contain the authors' description of their proposed follow-up research directions. This paper presents methods to automatically extract FWS from academic papers and classify them according to the different future directions embodied in the paper's content. FWS recognition methods will enable subsequent researchers to locate future work sentences more accurately and quickly and reduce the time and cost of acquiring the corpus. There is currently relatively little work on the automatic identification of future work sentences, and existing research cannot accurately identify FWS from academic papers and thus cannot support data mining on a large scale. Furthermore, the content of future work covers many aspects, and subdividing that content helps in analyzing specific development directions. In this paper, Natural Language Processing (NLP) is used as a case study, and FWS are extracted from academic papers and classified into different types. We manually build an annotated corpus with six different types of FWS. Then, automatic recognition and classification of FWS are implemented using machine learning models, and the performance of these models is compared based on the evaluation metrics. The results show that the Bernoulli Bayesian model has the best performance in the automatic recognition task, with the Macro F1 reaching 90.73%, and the SCIBERT model has the best performance in the automatic classification task, with the weighted average F1 reaching 72.63%. Finally, we extract keywords from FWS to gain a deeper understanding of the key content they describe, and we demonstrate that the content of FWS is reflected in subsequent research work by measuring the similarity between future work sentences and the abstracts.
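The abstract reports Macro F1 for the recognition task and weighted average F1 for the classification task. The short Python sketch below shows how the two averages differ, using the standard definitions; the labels and predictions are made-up toy data, not values from the paper.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's number of true samples."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Toy data: "a" is the minority class and is partly missed.
y_true = ["a", "a", "b", "b", "b", "b"]
y_pred = ["a", "b", "b", "b", "b", "b"]
print(round(macro_f1(y_true, y_pred), 4), round(weighted_f1(y_true, y_pred), 4))
```

Macro F1 penalizes errors on rare classes more heavily, while weighted F1 follows the class distribution, which is one reason the two scores are not directly comparable across tasks.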


MC-Nonlocal-PINNs: handling nonlocal operators in PINNs via Monte Carlo sampling

Xiaodong Feng , Yue Qian , Wanfang Shen

Category: Machine Learning

2022-12-26

We propose Monte Carlo Nonlocal physics-informed neural networks (MC-Nonlocal-PINNs), a generalization of MC-fPINNs in \cite{guo2022monte}, for solving general nonlocal models such as integral equations and nonlocal PDEs. As in MC-fPINNs, our MC-Nonlocal-PINNs handle the nonlocal operators in a Monte Carlo way, resulting in a very stable approach for high-dimensional problems. We present a variety of test problems, including high-dimensional Volterra-type integral equations, hypersingular integral equations and nonlocal PDEs, to demonstrate the effectiveness of our approach.
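To illustrate what handling a nonlocal operator "in a Monte Carlo way" can look like, the Python sketch below estimates a simple integral operator, (Ku)(x) = ∫₀¹ k(x, y) u(y) dy, by averaging over uniform random samples. This is a generic illustration under assumed choices (the kernel, the function u, the interval, and the helper name `mc_integral_operator` are all hypothetical); the operators treated in the paper, such as hypersingular ones, need more careful sampling than this.

```python
import random


def mc_integral_operator(kernel, u, x, num_samples=20000, seed=0):
    """Monte Carlo estimate of the integral of kernel(x, y) * u(y) over y in [0, 1].

    Samples y uniformly on [0, 1]; the interval has length 1, so the sample
    mean of the integrand is an unbiased estimate of the integral.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        y = rng.random()
        total += kernel(x, y) * u(y)
    return total / num_samples


# Assumed toy problem: k(x, y) = x * y and u(y) = y, so (Ku)(x) = x / 3 exactly.
estimate = mc_integral_operator(lambda x, y: x * y, lambda y: y, x=1.5)
exact = 1.5 / 3
print(abs(estimate - exact))
```

Because the estimate needs only point evaluations of u, the same idea carries over when u is a neural network, and the cost of sampling does not grow with a mesh in high dimensions.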


A Manipulator-Assisted Multiple UAV Landing System for USV Subject to Disturbance

Ruoyu Xu , Chongfeng Liu , Zhongzhong Cao , Yuquan Wang , Huihuan Qian

Category: Robotics

2022-12-23

Marine waves significantly disturb the unmanned surface vehicle (USV) motion. An unmanned aerial vehicle (UAV) can hardly land on a USV that undergoes irregular motion. An oversized landing platform is usually necessary to guarantee the landing safety, which limits the number of UAVs that can be carried. We propose a landing system assisted by tether and robot manipulation. The system can land multiple UAVs without increasing the USV's size. An MPC controller stabilizes the end-effector and tracks the UAVs, and an adaptive estimator addresses the disturbance caused by the base motion. The working strategy of the system is designed to plan the motion of each device. We have validated the manipulator controller through simulations and well-controlled indoor experiments. During the field tests, the proposed system caught and placed the UAVs when the disturbed USV roll range was approximately 12 degrees.


What Makes for Good Tokenizers in Vision Transformer?

Shengju Qian , Yi Zhu , Wenbo Li , Mu Li , Jiaya Jia

Category: Computer Vision

2022-12-21

The architecture of transformers, which have recently witnessed booming applications in vision tasks, has pivoted against the widespread convolutional paradigm. Relying on the tokenization process that splits inputs into multiple tokens, transformers are capable of extracting their pairwise relationships using self-attention. Although the tokenizer is the stem building block of transformers, what makes for a good tokenizer has not been well understood in computer vision. In this work, we investigate this uncharted problem from an information trade-off perspective. In addition to unifying and understanding existing structural modifications, our derivation leads to better design strategies for vision tokenizers. The proposed Modulation across Tokens (MoTo) incorporates inter-token modeling capability through normalization. Furthermore, a regularization objective, TokenProp, is incorporated into the standard training regime. Through extensive experiments on various transformer architectures, we observe both improved performance and intriguing properties of these two plug-and-play designs with negligible computational overhead. These observations further indicate the importance of the commonly omitted design of tokenizers in vision transformers.
